An Algorithm for Estimating all Matches Between Two
Abstract
We give a randomized algorithm for estimating the score vector of matches between a text string T of length N and a pattern string P of length M; this is the vector obtained when the pattern is slid along the text and the number of matches is counted at each position. The randomized algorithm takes deterministic time O((N/M) Conv(M)), where Conv(M) is the time for performing a convolution of two vectors of size M each. The algorithm finds an unbiased estimator of the scores, whose variance is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumptions are made about the probabilistic characteristics of the input, or about the number of different symbols appearing in T or P (i.e., the alphabet size need not be much smaller than M). The solution extends to the weighted case and to higher dimensions.
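The abstract does not spell out how the estimator is built, so the following is only a minimal sketch of one construction with the stated properties, not necessarily the authors' method. It assumes each symbol is mapped to an independent random code of +1 or -1, so that the product of two codes is always 1 for equal symbols and has expectation 0 for different ones; correlating the encoded text with the encoded pattern and averaging over several random codes then gives an unbiased estimate of each score. All function names and the `trials` parameter are hypothetical. For brevity the sketch performs one convolution over the whole text rather than splitting the text into blocks of size proportional to M, which is what a bound of the form O((N/M) Conv(M)) would suggest.

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(t, p):
    """Return c[i] = sum_j t[i+j] * p[j] for every full alignment i."""
    n, m = len(t), len(p)
    size = 1
    while size < n + m - 1:
        size *= 2
    # Correlation is convolution with the reversed pattern.
    ft = fft(list(t) + [0] * (size - n))
    fp = fft(list(reversed(p)) + [0] * (size - m))
    conv = fft([x * y for x, y in zip(ft, fp)], invert=True)
    # The recursive inverse omits the 1/size factor, so apply it here.
    # Inputs are +1/-1 integers, so rounding recovers the exact sums.
    return [round(conv[i + m - 1].real / size) for i in range(n - m + 1)]


def signed_scores(text, pattern, signs):
    """Correlate the +1/-1 encodings of text and pattern under one code."""
    return correlate([signs[c] for c in text], [signs[c] for c in pattern])


def estimate_scores(text, pattern, trials=32, rng=None):
    """Average the signed correlations over `trials` random sign codes."""
    rng = rng or random.Random()
    alphabet = sorted(set(text) | set(pattern))
    totals = [0] * (len(text) - len(pattern) + 1)
    for _ in range(trials):
        signs = {c: rng.choice((-1, 1)) for c in alphabet}
        for i, s in enumerate(signed_scores(text, pattern, signs)):
            totals[i] += s
    return [x / trials for x in totals]


def exact_scores(text, pattern):
    """Reference answer: count matching positions at every alignment."""
    m = len(pattern)
    return [sum(a == b for a, b in zip(text[i:i + m], pattern))
            for i in range(len(text) - m + 1)]
```

Under this assumption, an alignment where the pattern occurs exactly contributes only matching pairs, so its estimate equals M in every trial. Alignments with more mismatches carry more random cross terms, which is consistent with the abstract's remark that the variance is smallest for scores close to M.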
Related resources
An EM Algorithm for Estimating the Parameters of the Generalized Exponential Distribution under Unified Hybrid Censored Data
Unified hybrid censoring is a mixture of generalized Type-I and Type-II hybrid censoring schemes. This article presents statistical inference on the parameters of the Generalized Exponential Distribution when the data are obtained from the unified hybrid censoring scheme. It is observed that the maximum likelihood estimators cannot be derived in closed form. The EM algorithm for computing the ma...
Estimating Land Surface Temperature in the Central Part of Isfahan Province Based on Landsat-8 Data Using Split-Window Algorithm
Land surface temperature (LST) is used as one of the key sources to study land surface processes such as evapotranspiration, development of indexes, air temperature modeling and climate change. Remote sensing data offer the possibility of estimating LST all over the world with high temporal and spatial resolution. Landsat-8, which has two thermal infrared channels, provides an opportunity for t...
A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation
Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative-filtering-based recommender systems. In order to improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...
An Improved Big Bang-Big Crunch Algorithm for Estimating Three-Phase Induction Motors Efficiency
Nowadays, most of the generated electrical energy is consumed by three-phase induction motors. Evaluating the efficiency of induction motors is therefore vital for carrying out preventive measurements and maintenance and, eventually, for employing high-efficiency motors. In this paper, a novel and efficient method based on the Improved Big Bang-Big Crunch (I-BB-BC) Algorithm is presented for efficiency e...
Estimating the Parameters in Photovoltaic Modules: A Constrained Optimization Approach
This paper presents a novel identification technique for estimating the unknown parameters in photovoltaic (PV) systems. A single-diode model, which has five unknown parameters, is considered for the PV system. Using information from the standard test condition (STC), three of the unknown parameters are written as functions of the other two in a reduced model. An objective function and ...